A study compared a parallel distributed processing 
(PDP) model with a more traditional symbolic information processing 
model as accounts of early reading acquisition by human subjects. 
Two experimental paradigms were simulated. In one paradigm (a 
"savings" paradigm) subjects were divided into two groups and trained 
with stimulus sets that either represented voicing consistently in the 
orthography or mapped orthography onto voicing inconsistently. A total 
of 32 simulated trials were generated with each model. Analysis of 
the savings paradigm data for the symbolic model revealed no 
significant main or interaction effects. Analysis of the PDP data 
revealed significant main and interaction effects. A second paradigm 
involved a forced-choice task testing subjects' ability to make use 
of analytic generalization. A total of eight simulated trials were 
generated with each model. Analysis indicated no significant 
difference between the symbolic model and chance performance, while 
the PDP model had perfect performance, indicating that the network 
had learned to make use of the orthography-voicing relationship 
implicit in the stimuli. Findings suggest that, in the domain of 
early reading acquisition, the problem with the PDP model approach is 
not that it is too weak but that it is too strong: even the simplest 
PDP models exhibit learning beyond what is observed in human subjects 
faced with similar learning tasks. (Contains 18 references. An 
appendix of data is attached.) (RS) 
Parallel Distributed Processing (PDP) models are central to a great deal of current 
research in the cognition of reading and are beginning to assert an influence well beyond the 
boundaries of the cognitive science research community. Adams (1990), for example, has 
grounded an argument for a specific approach to beginning reading instruction on a PDP 
model of word learning (Seidenberg & McClelland, 1989). Ehri (1992) has begun to adopt 
PDP concepts in explaining the development of sight word reading, and researchers with 
interests in learning difficulties (Seidenberg, 1992; Patterson, Seidenberg, & McClelland, 
1989) have begun to model disabilities through simulated "lesioning" of working PDP 
systems. 

There is evidence, however, that the currently dominant PDP models of reading 
(McClelland & Rumelhart, 1981, 1988, pp. 203-239; Seidenberg & McClelland, 1989) are 
missing essential aspects of the cognition of early word perception and learning. One 
consistent general problem has been that the distributed representation of knowledge in PDP 
models is difficult to reconcile with the acquisition and use of distinct perceptual units (e.g., 
onsets and rimes) that appear to play an important role in learning to read (Treiman, 1992; 
Goswami, 1986, 1988). Another problem concerns the data sets used to test the adequacy 
of PDP models, which have relied almost exclusively on mature adult readers, even in models 
that make explicitly developmental claims (e.g., the Seidenberg & McClelland (1989) 
developmental model cites 29 empirical studies, only one of which employed children as 
subjects). But neither of these limitations is theoretically critical in the sense that it rules 
out the PDP approach as an appropriate framework for models of early literacy acquisition. 

Recent research in reading acquisition suggests, however, that a critical test of the 
adequacy of PDP models in explaining early reading acquisition may be at hand. As a 
consequence of a series of carefully controlled studies Byrne (1992) has identified what he 
calls the "default acquisition procedure" for reading. Briefly, these studies (Byrne, 1984, 
1992; Byrne & Carroll, 1989) investigated the way children learn to read new orthographies 
and found that, in the absence of explicit training in orthographic analysis, children adopt a 
non-analytic (i.e., paired-associate) approach. Despite being provided with a completely 
regular sound-symbol system, the children did not induce phoneme-grapheme 
correspondences over extended periods of training. Moreover, in a series of related studies 
exploring the learning of new orthographies by adult subjects, the same default procedure was 
evident at a sub-phonemic level. Although adult subjects recognized the alphabetic character 
of the orthography at the syllable and word levels, they did not discover the underlying 
system of regularity between the orthography and sub-phonemic feature elements, including 
the voicing contrast, which has been shown to have high perceptual salience (Miller & Nicely, 
1955). 
The present investigation reports on two simulations of the Byrne studies that suggest 
the PDP framework is incapable of accounting for the default acquisition procedure. The 
studies reported here compare a PDP processing model similar to the Seidenberg & 
McClelland (1989) model (developed using the McClelland & Rumelhart (1988) BP 
simulation) with a more traditional symbolic information processing model (McEneaney, 
1991, 1992). Results of trials simulating the Byrne studies indicate that the symbolic model 
provides a more adequate treatment of early reading acquisition. Theoretical analysis 
suggests that the differences between the two models' results are not specific to the model 
parameters or learning rules employed in the simulations, and that they constitute a genuinely 
critical test between a symbolic and a PDP approach in accounting for early reading 
acquisition by human subjects. 

Two experimental paradigms were simulated in the present investigation. In one 
paradigm (labelled by Byrne & Carroll a "savings" paradigm) subjects were divided into two 
groups and trained with two sets of stimuli. One group of subjects was trained using stimulus 
sets within which voicing was consistently represented orthographically. The second group of 
subjects was trained with stimulus sets that employed an inconsistent mapping of orthography 
onto voicing. It was reasoned that if subjects were using analytic learning, those in the 
consistent group would have an advantage over subjects in the inconsistent group. A second 
paradigm involved a forced-choice task testing subjects' ability to make use of analytic 
generalization. Subjects who had learned letters in a new orthography were asked to identify 
novel stimuli that retained one critical orthographic marker (indicating voicing) from the 
original training set. If analytic learning had occurred, this one critical feature would have 
been enough to allow subjects to make the correct choice, since the distractor in the 
forced-choice task did not include the correct voicing feature. As noted above, both adults and 
children learning the new orthographies failed to exhibit analytic learning in either of the 
experimental tasks. In the simulations carried out in the present investigation, however, the 
two models did not behave alike in this respect. 

The savings paradigm study began by training both models to a level of accuracy that 
ensured 100% correct recognition over at least three passes through the stimulus set used in 
the first episode of training. Since two different stimulus sets were employed, the set used in 
the first episode was counterbalanced. When the models had achieved the performance 
criterion, a second episode of training was initiated that employed either a second consistent 
or a second inconsistent stimulus set. A total of 32 simulated trials were generated with each 
model. Data analysis involved a 2 (Episode) × 2 (Consistency) repeated measures design that 
employed trials to criterion as the dependent measure, a design identical to that reported by 
Byrne & Carroll (1989, p. 313). 
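Because Episode varies within subjects and Consistency across subjects, the 2 × 2 design above can be analyzed as a mixed-design (split-plot) ANOVA, with Consistency tested against subjects-within-groups variance and Episode and the interaction tested against the residual. The following is a minimal Python sketch assuming a balanced design; the function name `mixed_anova` and the toy scores are hypothetical illustrations, not the authors' analysis software or data.

```python
from statistics import fmean


def mixed_anova(data):
    """Two-factor mixed-design ANOVA on a balanced data set (illustrative sketch).

    data[g][s][e] is the score (e.g., trials to criterion) of subject s in
    between-subjects group g at within-subjects level e.
    Returns (sums_of_squares, degrees_of_freedom, f_ratios).
    """
    a = len(data)        # between-subjects levels (e.g., Consistency)
    n = len(data[0])     # subjects per group
    b = len(data[0][0])  # within-subjects levels (e.g., Episode)
    if any(len(group) != n for group in data) or any(
        len(subj) != b for group in data for subj in group
    ):
        raise ValueError("this sketch assumes a balanced design")

    G, S, E = range(a), range(n), range(b)
    grand = fmean(data[g][s][e] for g in G for s in S for e in E)
    m_g = [fmean(data[g][s][e] for s in S for e in E) for g in G]     # group means
    m_e = [fmean(data[g][s][e] for g in G for s in S) for e in E]     # level means
    m_ge = [[fmean(data[g][s][e] for s in S) for e in E] for g in G]  # cell means
    m_gs = [[fmean(data[g][s]) for s in S] for g in G]                # subject means

    ss = {
        "between": b * n * sum((m_g[g] - grand) ** 2 for g in G),
        "subjects_within_groups": b * sum(
            (m_gs[g][s] - m_g[g]) ** 2 for g in G for s in S
        ),
        "within": a * n * sum((m_e[e] - grand) ** 2 for e in E),
        "interaction": n * sum(
            (m_ge[g][e] - m_g[g] - m_e[e] + grand) ** 2 for g in G for e in E
        ),
        "residual": sum(
            (data[g][s][e] - m_ge[g][e] - m_gs[g][s] + m_g[g]) ** 2
            for g in G for s in S for e in E
        ),
    }
    df = {
        "between": a - 1,
        "subjects_within_groups": a * (n - 1),
        "within": b - 1,
        "interaction": (a - 1) * (b - 1),
        "residual": a * (n - 1) * (b - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}
    f = {
        "between": ms["between"] / ms["subjects_within_groups"],
        "within": ms["within"] / ms["residual"],
        "interaction": ms["interaction"] / ms["residual"],
    }
    return ss, df, f


# Hypothetical toy data: two groups, two subjects each, two episodes.
toy = [
    [[4, 2], [6, 2]],  # group 0 (e.g., consistent)
    [[4, 6], [6, 8]],  # group 1 (e.g., inconsistent)
]
ss, df, f = mixed_anova(toy)
print(f)  # {'between': 5.0, 'within': 1.0, 'interaction': 25.0}
```

The five sums of squares partition the total sum of squares exactly, which is a convenient check on any implementation of this design.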

Analysis of the savings paradigm data for the symbolic model revealed no significant 
main effects (Episode: F = 0.3659, p = 0.55; Consistency: F = 1.2, p = 0.28) and no significant 
interaction (F = 0.3659, p = 0.55). Analysis of the PDP data, on the other hand, revealed 
significant main effects (Episode: F = 22.6105, p < .0001; Consistency: F = 75.0721, p < .0001) 
and a significant interaction (F = 60.1093, p < .0001). Inspection of the PDP means for 
the second episode of training revealed both an advantage for the consistent group and a 
disruption of learning for the inconsistent group, indicating that the PDP model had learned 
analytically. 
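The size of the savings and disruption effects can be read directly off the mean trials to criterion reported in the appendix tables. This short Python sketch only does the descriptive arithmetic (it is not a significance test); the helper names `episode_change` and `interaction_contrast` are hypothetical.

```python
# Mean trials to criterion (Episode 1, Episode 2), copied from the appendix tables.
MEANS = {
    "symbolic": {"consistent": (4.000, 4.000), "inconsistent": (3.625, 3.875)},
    "network": {"consistent": (153.312, 138.500), "inconsistent": (151.000, 212.812)},
}


def episode_change(model, group):
    """Episode 2 minus Episode 1; a negative value means fewer trials (savings)."""
    first, second = MEANS[model][group]
    return second - first


def interaction_contrast(model):
    """Inconsistent group's episode change minus the consistent group's."""
    return episode_change(model, "inconsistent") - episode_change(model, "consistent")


for model in MEANS:
    print(
        model,
        round(episode_change(model, "consistent"), 3),
        round(episode_change(model, "inconsistent"), 3),
        round(interaction_contrast(model), 3),
    )
# symbolic 0.0 0.25 0.25
# network -14.812 61.812 76.624
```

For the network model the consistent group needs about 15 fewer trials in the second episode while the inconsistent group needs about 62 more; the symbolic model's means barely move.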
The forced-choice simulations also began by training both models to a level of 
accuracy that ensured 100% correct recognition over three passes through a training set with 
a regular mapping of orthography onto two sub-phonemic features (place of articulation and 
voicing). When the learning criterion was achieved, the models were presented with eight new 
stimuli, each made up of a new orthographic feature along with the previously seen feature 
indicating either voiced or voiceless pronunciation. They were also presented with two new 
phonemes, one voiceless and the other voiced (e.g., /f/ and /v/), and required to select the 
phoneme represented by the orthographic stimulus. It was reasoned that if analytic learning 
was occurring, performance on this forced-choice task would be significantly greater than 
chance (i.e., more than 4 of 8 correct). 
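The chance baseline of 4 of 8 follows from the binomial distribution: with two alternatives per item, a learner who guesses is expected to get half the items right. As a complement to the t-test used in the study (not a replacement for it), this minimal Python sketch computes the exact probability of reaching a given score by guessing alone, assuming independent guesses with probability 1/2; `p_at_least` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def p_at_least(k, n=8, p=Fraction(1, 2)):
    """Exact binomial probability of k or more correct out of n two-alternative items."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected = 8 * Fraction(1, 2)  # 4, the chance level used in the text
for k in (6, 7, 8):
    print(k, p_at_least(k), float(p_at_least(k)))
# 6 37/256 0.14453125
# 7 9/256 0.03515625
# 8 1/256 0.00390625
```

Under this assumption a perfect 8 of 8 on a single trial would arise by guessing only about once in 256 trials.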

A total of eight simulated trials were generated with each model. Data analysis involved 
a t-test to determine whether a significant deviation from chance performance had occurred 
(as in Experiment 1 of Byrne, 1984). Analysis indicated no significant difference between the 
symbolic model and chance performance (t = 0.8864, one-tailed p = .1972). The PDP 
model, however, had perfect performance (8/8 correct) across all 8 simulated trials, indicating 
that the network had learned to make use of the orthography-voicing relationship implicit in 
the stimuli. 
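The comparison with chance is a one-sample, one-tailed t-test of the mean number correct against 4. This minimal Python sketch uses only the standard library; it assumes the test is one-tailed in the "greater than chance" direction, computes the p-value by integrating the Student's t density with Simpson's rule, and runs on hypothetical scores rather than the study's data. The function names are illustrative. When every score is identical (as with the network model's 8/8 on every trial) the sample variance is zero and t is undefined, which the sketch reports as `None`, mirroring the "NA" entries in the appendix.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """One-tailed p = P(T >= t), using symmetry and Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_t(scores, chance):
    """One-sample t-test of the mean score against a chance level (one-tailed, greater)."""
    n = len(scores)
    sd = stdev(scores)
    if sd == 0:
        return None, n - 1, None  # t undefined, like the "NA" entries for the network model
    t = (fmean(scores) - chance) / (sd / math.sqrt(n))
    return t, n - 1, upper_tail_p(t, n - 1)


# Hypothetical per-trial scores (number correct out of 8); not the study's data.
scores = [4, 5, 3, 6, 4, 5, 4, 5]
t, df, p = one_sample_t(scores, chance=4)
print(f"t({df}) = {t:.4f}, one-tailed p = {p:.4f}")
```

Where SciPy is available, `scipy.stats.ttest_1samp` with a one-sided alternative performs the same test.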

Although the analysis indicates that the PDP model learns analytically, it might be 
argued that this performance is an artifact of implementation-specific parameters rather than a 
more general characteristic of the PDP framework. In the present case, however, it turns out 
that two characteristics of the training and testing data sets are more important than the 
parameters of the model. One is that these data sets conform to the linear predictability 
constraint (any one feature can be predicted by a linear combination of the activations of the 
other features), which means these data sets are guaranteed to be learnable by the delta rule 
(McClelland & Rumelhart, 1986). The second characteristic is that the one-to-one mapping of 
orthographic and phonemic feature elements results in an auto-associative learning task that is 
inevitably analytic. Even the simplest kinds of PDP networks (e.g., a 2-layer perceptron or a 
1-layer auto-associative net) demonstrate the same analytic learning following training. Altering 
parameters will change the rate at which learning occurs (or can prevent learning), but if 
learning does occur, it will be analytic. 
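The argument can be illustrated with the simplest network mentioned above: a single layer of modifiable weights (input units feeding output units) trained with the delta rule. This is a minimal Python sketch, not the BP simulation used in the study; the feature coding, the four novel letter shapes, the learning rate, and the epoch count are all hypothetical. Because the "voiced" output equals the orthographic voicing mark exactly, the training set satisfies linear predictability and the training error can be driven to (numerically) zero. Starting from zero weights, the weights from shapes never seen in training stay at zero, so for a novel shape the voiced unit's output depends only on the voicing mark.

```python
N_NOVEL = 4  # hypothetical: four new letter shapes, never shown during training


def encode(shape, voicing_mark):
    """Input units: [labial shape, alveolar shape, novel shapes..., voicing mark, bias]."""
    x = [0.0] * (2 + N_NOVEL + 2)
    x[shape] = 1.0
    x[-2] = 1.0 if voicing_mark else 0.0
    x[-1] = 1.0  # always-on bias unit
    return x


# Output units: [labial, alveolar, voiced]; voicing is marked one-to-one in the spelling.
TRAIN = [
    (encode(0, False), [1.0, 0.0, 0.0]),  # e.g. /p/
    (encode(0, True), [1.0, 0.0, 1.0]),   # e.g. /b/
    (encode(1, False), [0.0, 1.0, 0.0]),  # e.g. /t/
    (encode(1, True), [0.0, 1.0, 1.0]),   # e.g. /d/
]


def forward(weights, x):
    """Linear output units: each output is a weighted sum of the inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_delta(patterns, epochs=500, lr=0.1):
    """Delta (LMS) rule with one layer of modifiable weights, starting from zero."""
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, target in patterns:
            out = forward(weights, x)
            for j in range(n_out):
                err = target[j] - out[j]
                for i in range(n_in):
                    weights[j][i] += lr * err * x[i]  # inactive inputs get no update
    return weights


def forced_choice_score(weights):
    """Pick the voiced or voiceless candidate for each novel shape; count correct."""
    correct = 0
    for shape in range(2, 2 + N_NOVEL):
        for mark in (False, True):
            chose_voiced = forward(weights, encode(shape, mark))[2] > 0.5
            correct += chose_voiced == mark
    return correct


w = train_delta(TRAIN)
print("forced-choice correct:", forced_choice_score(w), "of", 2 * N_NOVEL)
```

In this toy setup the trained network chooses correctly on all eight novel items, the same qualitative pattern as the PDP model in the forced-choice simulation. Lowering `lr` or `epochs` slows how quickly the training error falls, and a large enough `lr` makes training diverge, which parallels the point that parameters affect the rate of learning rather than its analytic character.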

In the domain of early reading acquisition, therefore, the problem with the PDP 
approach is not that it is too weak (Rumelhart & McClelland, 1986) but that it is too strong. 
Even the simplest PDP models exhibit learning beyond what we observe in human subjects 
faced with similar learning tasks. This, of course, does not eliminate PDP models from 
consideration at other stages of development, but it does raise two interesting questions about 
the role of PDP models in the cognition of reading. If PDP systems become one of a set of 
components, what do the other components look like, and how do the components interact (if 
at all)? Whatever the final verdict regarding the place of PDP models in explaining reading 
acquisition, one thing seems clear: the results reported in this paper rule out PDP processing 
as an explanatory framework for reading acquisition in its earliest stages. 
Savings Paradigm

Factor         Levels
Episode        2       (within subjects)
Consistency    2       (across subjects)

Symbolic Model

Source         DF    F          p
Consistency    1     1.2000     0.2820
Episode        1     0.36585    0.5498
Interaction    1     0.36585    0.5498

Means and (standard deviations):

                      Consistency
Episode        Consistent          Inconsistent
Trial 1        4.000 (0.7303)      3.625 (0.8062)
Trial 2        4.000 (0.9666)      3.875 (0.9574)




Network Model

Source         DF    F          p
Consistency    1     75.072     < 0.0001
Episode        1     22.6105    < 0.0001
Interaction    1     60.1093    < 0.0001

Means and (standard deviations):

                      Consistency
Episode        Consistent            Inconsistent
Trial 1        153.312 (5.48597)     151.000 (11.1535)
Trial 2        138.500 (11.7473)     212.812 (32.2701)



[Figure: Network Model, means plotted by Episode]



Forced-choice Paradigm

Population T Test (population mean = 4.00)

Population        Mean Correct    Variance    T         p (1-tailed)
Symbolic Model    4.33            1.69697     0.8864    0.1972
Network Model     8.00            0           NA        NA